DOCUMENT RESUME

ED 305 648                                              CS 211 761

AUTHOR       Baldwin, Janet
TITLE        Writing Skills of Graduating High School Seniors and
             Adult High School Non-Completers: A Study of Factor
             Structure Invariance.
PUB DATE     28 Mar 89
NOTE         24p.; Paper presented at the Annual Meeting of the
             National Council on Measurement in Education (San
             Francisco, CA, March 28, 1989).
PUB TYPE     Speeches/Conference Papers (150) -- Reports -
             Research/Technical (143)

EDRS PRICE   MF01/PC01 Plus Postage.
DESCRIPTORS  *Adult Students; Dropouts; High School Equivalency
             Programs; High Schools; High School Seniors; Models;
             *Structural Analysis (Linguistics); Writing
             Instruction; Writing Research; Writing Skills
IDENTIFIERS  *Confirmatory Factor Analysis; Factor Invariance;
             *General Educational Development Tests; Linear
             Relationships; Writing Skills Assessment Test



ABSTRACT 

A study examined factor structure invariance among the writing skills
of graduating high school seniors and adult high school
non-completers. The study had three purposes: (1) to use LISREL
confirmatory factor analysis (CFA) procedures to specify and test a
series of factor models based on the test specifications content of
the General Educational Development (GED) Writing Skills Test; (2) to
evaluate the fit of these models to a set of test data obtained from a
national sample of graduating high school seniors; and (3) to test the
invariance of the best fitting factor structure model in both the
seniors and the adult high school non-completers. Subjects were 2,532
high school seniors who took the anchor form of the Writing Skills
Test in the spring of 1987 and 698 adult high school non-completers
between the ages of 17 and 19. Results indicated the nature of writing
skill measured by this test to be a single construct reflecting
generalized proofreading/editing skills. No support was found for the
view that separate item-type methods factors accounted for variability
of performance on the multiple-choice portion of the GED test. This
study provided an empirical field-based illustration of the use of CFA
procedures to evaluate the factor structure of multiple-trait,
multiple-method data and to test for the invariance of plausible
measurement models over multiple groups. (Two tables of data are
included, and 24 references and a list of variables are attached.)
(RAE)



* Reproductions supplied by EDRS are the best that can be made
* from the original document.



Writing Skills of Graduating High School Seniors
and Adult High School Non-Completers:
A Study of Factor Structure Invariance



Janet Baldwin
American Council on Education, Washington, D.C.



Presented at the Annual Meeting of the
National Council on Measurement in Education
in San Francisco on March 28, 1989.



Writing Skills of Graduating High School Seniors
and Adult High School Non-Completers:
A Study of Factor Structure Invariance

by

Janet Baldwin
American Council on Education

Various theories about the nature of writing skill address such
questions as whether writing skill is a single generalizable construct
or comprised of discrete multiple skills; whether writing performance
is influenced by the nature of the task, assignment, or question
format which elicits such performance; or whether writing skills are
context based, influenced by the various subject matter domains within
which writing may be assessed. Because some writers have argued that
"there is always more than one major factor underlying any test data
in the achievement domain. . . . even when achievement in a specific
topic is being measured" (Birenbaum & Tatsuoka, 1982; p. 259),
theories about test data in the domain of writing achievement also may
address the multidimensionality of writing achievement test data.
Because the Tests of General Educational Development are designed to
measure the major and lasting outcomes of a four year high school
program of study, the GED Writing Skills Test is intended to assess
generalized writing skills which correspond to what graduating high
school seniors are required to know and to do. Although it may be
reasonable to expect a test of generalized knowledge and skills to be
unidimensional (Phillips & Mehrens, 1987), assumptions about what the
test measures should also be examined empirically. Performance on a
multiple-choice writing test which includes items in multiple content
or skill areas should be evaluated to determine whether the test
actually measures distinct multiple skills or, rather, whether it
indicates a more generalized set of editing or proofreading
capabilities. Examining what the test measures in the various
populations for which it is intended not only improves the
interpretability of the test scores but increases knowledge about the
nature of writing skills in different groups.

Although the test specifications for the GED Writing Skills Test are
used primarily as a blueprint for test assembly, the classification of
items in the test specifications may also be used to provide a
conceptual basis for evaluating the nature of what the test actually
measures. For example, item type classifications from the test
specifications of the SAT-V and SAT-M subtests have been used as a
basis for examining the construct validity of the SAT across
populations (Rock & Werts, 1979). In the current study the test
specifications were used as a basis for defining both writing skill
and item type factors in order to examine the internal construct
validity of the GED Writing Skills Test (Part I) across populations.

This study had three purposes. The first purpose was to use LISREL
confirmatory factor analysis (CFA) procedures (Joreskog, 1979;
Joreskog & Sorbom, 1983; Long, 1983) to specify and test a series of
factor models based on the test specifications content of the General
Educational Development (GED) Writing Skills Test. The writing test
content included three writing skills, or traits, and used three
multiple-choice item-types, or methods, of measurement. The second
purpose was to evaluate the fit of these models to a set of test data
obtained from a national sample of graduating high school seniors. The
third purpose was to test the invariance of the best fitting factor
structure model in both the seniors and the adult high school
non-completers.

Instrument and Data Source

The GED Writing Skills Test is one of five tests in the GED test
battery which were developed to measure the major and lasting outcomes
of the learning associated with a high school education. Although the
high school equivalency tests are administered to a diverse group of
adults who did not complete a typical four-year high school program of
study, these tests are standardized on a nationally representative
sample of graduating high school seniors in order to provide the score
scale and the basis for passing requirements. The GED Writing Skills
Test has recently been revised to reflect changes in high school
writing curricula which place more emphasis on essay writing and
proofreading skills than in the past (GED Testing Service, 1987). In
order to facilitate the interpretation of test scores in these groups,
it is important to know whether the test measures the same writing
constructs in both adult high school non-completers, for whom the test
is intended, and graduating high school seniors, on whom the test is
standardized.

The GED Writing Skills Test has two parts. Part I contains 50
multiple-choice items¹ and Part II consists of a single essay topic.
Data used in the analyses for this study are limited to
multiple-choice items from the anchor form of the test. The
multiple-choice portion of the test measures the ability to edit and
correct errors within the context of one or more paragraphs of
extended discourse. The basis for the items is a number of selections
of prose, each consisting of one or more paragraphs. Errors in
sentence structure, usage, and mechanics occur throughout the
selection and reflect the types of errors that examinees typically
make in their writing. These errors are tested in the items that
follow each passage.

¹ In addition to the 50 multiple-choice items which are scored, Part I
of the Writing Skills Test also contains 6 non-scored field test items.

In order to minimize problems associated with factoring dichotomous
data (Gorsuch, 1983), fifteen miniscale variables were created by
summing across dichotomous test items of similar content. All
multiple-choice items were classified by writing specialists according
to type of editing skill measured and item type. Each miniscale
variable was based on 3 to 5 test items which had been judged to
measure the same skill (trait), using the same item type (method)
(Attachment A).
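
The miniscale construction described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the function name `build_miniscales`, the item labels, and the
item-to-miniscale assignments are hypothetical, since the actual
assignments are those listed in Attachment A.

```python
from collections import defaultdict


def build_miniscales(responses, item_to_scale):
    """Sum one examinee's 0/1 item scores into miniscale scores.

    responses:     dict mapping item label -> 0 (wrong) or 1 (right)
    item_to_scale: dict mapping item label -> miniscale label
                   (hypothetical here; the study used Attachment A)
    """
    totals = defaultdict(int)
    for item, score in responses.items():
        if score not in (0, 1):
            raise ValueError(f"item {item!r} is not scored 0/1: {score!r}")
        totals[item_to_scale[item]] += score
    return dict(totals)


# Hypothetical example: two miniscales of three items each.
item_to_scale = {
    "q1": "usage_correction",
    "q2": "usage_correction",
    "q3": "usage_correction",
    "q4": "mechanics_revision",
    "q5": "mechanics_revision",
    "q6": "mechanics_revision",
}
responses = {"q1": 1, "q2": 0, "q3": 1, "q4": 1, "q5": 1, "q6": 1}
scores = build_miniscales(responses, item_to_scale)
```

Summing 3 to 5 dichotomous items gives each miniscale a score range of
0 to 3 (or up to 5), which behaves more like a continuous variable than
a single right/wrong item does.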

The test was administered to a nationally representative sample of
high school seniors in the spring of 1987 as part of a national
standardization study conducted by the test developer. For purposes of
these analyses, only those graduating high school seniors who took the
anchor form of the Writing Skills Test (N = 2,532) were included. From
January through June of 1988, the test was administered to adult GED
examinees seeking to qualify for a high school equivalency diploma.
Because the influence of age on factor structure was not the specific
focus of this study, it was necessary to control this variable in the
adult GED examinee sample, where ages ranged from 16 to 87. Therefore,
only those adult high school non-completers between the ages of 17 and
19 (N = 699) were included in the "young adult" sample for these
analyses. This age range was selected because it is nearest to the age
range of graduating high school seniors. For each group, covariance
matrices of measured variables were used as input data for CFA
procedures.

Method

Preliminary exploratory factor analyses applied both principal factor
and maximum likelihood procedures to the dichotomous item level data
in order to determine whether the items clustered together in a way
which supported the miniscale item/variable groupings. Second, a
series of factor models were specified to represent test content.
These models specified up to three traits, namely proofreading and
editing in the areas of (1) sentence structure, (2) usage, and (3)
mechanics, and up to three methods, or item types, namely (1) sentence
correction, (2) sentence revision, and (3) construction shift,
according to content categories in the test specifications table of
the multiple-choice portion of the GED Writing Skills Test (GED
Testing Service, 1984, 1987). Confirmatory factor analyses were
applied to the miniscale level data from the national sample of
graduating high school seniors (N = 2532) in order to determine the
best-fitting model for subsequent multiple-group hypothesis testing.
Third, using simultaneous CFA for two groups, the best-fitting model
was tested for goodness of fit to data from both the sample of
graduating seniors and the sample of adult high school non-completers.

CFA procedures. Eleven models were specified and tested using
confirmatory factor analysis procedures. Model 1 was a null model,
with no common factors. Model 2 hypothesized a single factor
influencing all fifteen variables. Models 3, 4, 5, and 6 were chosen
to represent relationships among writing skill content factors. Models
7, 8, 9, and 10 were chosen to represent relationships among the item
type factors. Model 11 represented a factor model which included three
writing skills and three item types.



Model 1. Null Model - 15 uncorrelated factors;

Model 2. One Factor Model;

Model 3. Two Correlated Trait Factors (Traits 1 and 2 combined into
one factor);

Model 4. Two Correlated Trait Factors (Traits 1 and 3 combined into
one factor);

Model 5. Two Correlated Trait Factors (Traits 2 and 3 combined into
one factor);

Model 6. Three Correlated Trait Factors;

Model 7. Two Correlated Method Factors (Methods 1 and 2 combined
into one factor);

Model 8. Two Correlated Method Factors (Methods 1 and 3 combined
into one factor);

Model 9. Two Correlated Method Factors (Methods 2 and 3 combined
into one factor);

Model 10. Three Correlated Method Factors;

Model 11. Six Factors: three correlated trait factors and three
correlated methods factors.



In these single group confirmatory factor analyses, all latent factors
were constrained to have unit variances and factor correlations for
all models were freely estimated. Criteria for model-data fit included
chi-square difference tests (Joreskog & Sorbom, 1983) and several
indices of fit produced as output in the LISREL program: chi-square
(probability) value, Goodness of Fit Index (GFI), Root Mean Square
Residual (RMR), and Normalized Residuals (NR). In addition, the
Parsimonious Fit Index (PFI)² (James, Mulaik, and Brett, 1982) and
ratios of chi-square to degrees of freedom³ (Joreskog & Sorbom, 1979)
were used. Finally, judgments were made about which of the models
provided the most plausible and parsimonious representation of the
data.
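
The degrees of freedom attached to each model's chi-square follow from
a parameter count. The sketch below is an illustration, not the LISREL
specification itself: it assumes that each of the 15 miniscale
variables loads on exactly one factor, that error variances are
uncorrelated, and (as stated above) that factor variances are fixed at
one while factor correlations are free. The function name
`cfa_degrees_of_freedom` is hypothetical, and Model 11 is not covered
because its full specification is not given here.

```python
def cfa_degrees_of_freedom(n_vars, n_factors):
    """Degrees of freedom for a simple-structure CFA with unit factor variances.

    Assumes one loading and one error variance per measured variable and
    freely estimated correlations among the factors. n_factors=0 denotes
    the null model, in which only the variable variances are estimated.
    """
    moments = n_vars * (n_vars + 1) // 2  # unique variances and covariances
    if n_factors == 0:
        free = n_vars
    else:
        loadings = n_vars
        error_variances = n_vars
        factor_correlations = n_factors * (n_factors - 1) // 2
        free = loadings + error_variances + factor_correlations
    return moments - free


df_null = cfa_degrees_of_freedom(15, 0)
df_one_factor = cfa_degrees_of_freedom(15, 1)
df_two_factor = cfa_degrees_of_freedom(15, 2)
```

Under these assumptions the one-factor and two-factor counts come to 90
and 89, which agree with the degrees of freedom reported for Model 2
and for Models 3, 5, and 7 in the results.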

Because the chi-square value is dependent on sample size, the
chi-square probability value in very large samples may be significant
even when the model represents the data quite well. In small samples
it may be non-significant even for models which are poor. Therefore,
in analyses based on large samples, the chi-square probability value
can lead to rejection of a good model, thereby reducing its usefulness
as an indicator of goodness of fit. Hayduk (1987) noted that
chi-square is instructive as an indicator of fit for samples ranging
in size from about 50 to 500, although this range may vary depending
on the kind of model to be estimated. Samples larger than 500, he
observed, require other indices of fit. Criteria of fit which may be
more relevant are sequential tests of incremental differences in fit,
or chi-square difference tests, because such tests improve inference
with both large and small samples (Bentler, 1980). Because the
differences in chi-square values are themselves chi-square statistics,
they can be used to test the importance of parameters that
differentiate nested models.

² The parsimonious fit index (PFI) is actually Bentler and Bonett's
(1980) normed fit index modified to take into account the number of
degrees of freedom given up in order to arrive at a particular level
of goodness of fit. Generally, the models with the maximum values of
PFI are those that best describe the data with the fewest unknown
parameters (Loehlin, 1987). The formula for the PFI, where o refers to
the null model and k refers to the compared model, is:

\[
\mathrm{PFI} = \frac{df_k}{df_o} \times \frac{\chi^2_o - \chi^2_k}{\chi^2_o}
\]

³ The range for recommended ratios of chi-square to degrees of freedom
(df) is typically between 2 and 5 (Carmines & McIver, 1981).
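
The two supplementary indices described in footnotes 2 and 3 are simple
to compute once the chi-square values are in hand. The sketch below is
a minimal illustration; the function names are hypothetical, and the
null-model values in the PFI example are made up for demonstration
because the null-model chi-square is reported only in Table 1.

```python
def parsimonious_fit_index(chi2_null, df_null, chi2_model, df_model):
    """PFI: the normed fit index weighted by the ratio df_model / df_null."""
    normed_fit = (chi2_null - chi2_model) / chi2_null
    return (df_model / df_null) * normed_fit


def chi2_df_ratio(chi2, df):
    """Ratio of chi-square to its degrees of freedom."""
    return chi2 / df


# Hypothetical values, chosen only to show the arithmetic:
# normed fit = (1000 - 100) / 1000 = 0.9, parsimony ratio = 8 / 10 = 0.8.
example_pfi = parsimonious_fit_index(1000.0, 10, 100.0, 8)

# Ratio for the one-factor model, using the chi-square (621.54) and
# degrees of freedom (90) reported for Model 2 in the results.
one_factor_ratio = chi2_df_ratio(621.54, 90)
```

Because the parsimony ratio rises as fewer parameters are estimated,
two models with the same normed fit will differ in PFI in favor of the
simpler one.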

Invariance analyses. The model judged to provide the best fit to the
data from the seniors sample was then tested for equality of factor
structure over samples of seniors and young adult GED examinees. In
each analysis, latent factor variances were freely estimated and
indicator variables were selected to set the metric for each factor.
The following invariance hypotheses were tested:

Equal SIGMA. Equal covariance matrices in both groups.

Equal k. The factor analysis model has the same structure in both
groups; i.e., the measured variables load on the same number of
factors in the same pattern in both groups.

Equal LAMBDA. In addition to the constraints of Equal k, the factors
are measured in the same units in both groups; i.e., equal factor
loadings.

Equal THETA. In addition to previous constraints, the factors are
measured with the same accuracy in both groups; i.e., error variances
are the same in both groups.

Equal PHI. In addition to previous constraints, the variances and
covariances of the latent factors are equal.

In an analysis of variance framework, whenever the primary purpose of
sampling is to make comparisons across subgroups, the optimum sample
is one where the sample sizes of the subgroups are equal (Sudman,
1976). Because the sample sizes for seniors and GED examinees were not
only very large but also unequal (2,532 and 699, respectively), it was
anticipated that judgments about goodness of fit could be confounded
to an unknown extent and that chi-square tests of factor structure
invariance may be highly significant regardless of how well the factor
model represented the data. In pointing out the limitations of the use
of the chi-square value as a goodness of fit indicator in large
samples, Hayduk reports Hoelter's (1983) recommendation concerning the
use of a "critical-N", the sample size that would be required to make
the observed differences between the estimated and the observed
covariance matrices just significant at a typical level such as .05.
After examining numerous models, Hoelter suggested that a reasonable
sample size cut-point for CFA hypothesis testing is a critical-N of
200 or more. However, because problems of nonconvergence and instances
of improper solutions have been found for sample sizes less than 400
(Boomsma, 1985), a critical-N of between 400 and 500 may be more
appropriate. Hayduk (1987) pointed out that Hoelter's decision
criterion can be obtained by simply inserting the critical-N sample
size into the LISREL program, using the observed covariance matrix
computed on the basis of the actual sample size. This approach, in
effect, ignores the extra sensitivity or precision provided by the
extra cases in the sample.
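
The critical-N idea can be made concrete for a single-group model. The
sketch below is a hedged illustration rather than the procedure used in
this study: it assumes the maximum likelihood chi-square equals (N - 1)
times the minimized fit function, it uses the form of Hoelter's index
as it is commonly cited (based on the normal approximation to
chi-square), and it does not handle the two-group case analyzed here.
The function names are hypothetical.

```python
import math


def hoelter_critical_n(chi2, df, n, z_crit=1.645):
    """Approximate critical N for a single-group model.

    Assumes chi2 = (n - 1) * F, where F is the minimized fit function.
    z_crit is the upper-tail standard normal value for the chosen
    significance level (1.645 corresponds to .05).
    """
    fit_function = chi2 / (n - 1)
    return (z_crit + math.sqrt(2 * df - 1)) ** 2 / (2 * fit_function) + 1


def rescaled_chi2(chi2, n, n_new):
    """Chi-square implied when the same covariance matrix is analyzed as if
    the sample size were n_new (same assumption: chi2 = (n - 1) * F)."""
    return chi2 * (n_new - 1) / (n - 1)


# One-factor model values reported for the seniors sample in the results:
# chi-square 621.54 with 90 degrees of freedom, N = 2,532.
cn = hoelter_critical_n(621.54, 90, 2532)
chi2_at_400 = rescaled_chi2(621.54, 2532, 400)
```

The second function mirrors the device of substituting a smaller sample
size while keeping the observed covariance matrix: the fit function is
unchanged, so the chi-square shrinks in proportion to N - 1.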

In order to consider whether decisions about goodness of fit may be
clarified using this approach, invariance analyses were carried out
using both the full sample sizes and a critical-N of 400 for each
group. The critical-N analyses used the original covariance matrices
and substituted the smaller sample sizes in the LISREL programs. This
approach was applied in order to estimate the chi-square probability
values for samples just large enough to detect meaningful differences,
i.e., a sample size for which the chi-square may be considered an
instructive indicator of fit (Hayduk, 1987).

Results and Conclusions

Preliminary EFA. Results from both principal factors and maximum
likelihood methods supported the view that there was only one
non-trivial factor which accounted for common variance in the raw
data. The first factor had an eigenvalue of 18.2. While a second
factor had an eigenvalue slightly over 1.0, only one of the 50
multiple-choice items had a loading on this factor which was greater
than .30, and this loading was only .31. Finding support for only a
single factor at the multiple-choice item level suggested that a
single factor may also account for relationships among the 15
miniscale variables. Nevertheless, for the purpose of completeness,
all models which were proposed were tested.

CFA procedures. Six of the eleven models tested (Models 4, 6, 8, 9,
10, and 11) produced non-positive definite PHI matrices, making
interpretation of the results for these models questionable. However,
based on the single factor results from the EFA, and on the
generalized nature of proofreading and editing skills the
multiple-choice portion of the test was designed to measure, it was
not surprising that some models with two or more factors proved
untenable. For each of the remaining five models, all estimated
parameters were highly significant and there were no instances of
improper solutions. Goodness-of-fit results for these five models,
namely the null model (Model 1), the one factor model (Model 2),
models with two correlated traits (Model 3 and Model 5), and a model
with two correlated methods (Model 7), are presented in Table 1.

Although the chi-square results for these five models were highly
significant (p < .0005), suggesting that the proposed models do not
fit the data, the significant chi-square results were expected given
the very large sample size for the high school senior group (N =
2532). Therefore, other indices of fit were evaluated in order to
select the best fitting model.

Each of the two-factor models (Model 3, Model 5, and Model 7) produced
highly similar goodness of fit indices. Chi-square values ranged
between 617.10 and 620.65 with 89 degrees of freedom. For each of
these three models, the GFI was .962, the RMR was .023, and the number
of normalized residuals (NR) greater than 2.0 was 10. The value for
the PFI was the same for these three models, .82. Based on these
goodness of fit indices, there appeared to be no discernible
differences in these two-factor models. The goodness of fit indices
for the one-factor model (GFI, RMR, and the number of NR greater than
2.0) were identical to those for the two-factor models. Because the
chi-square value had an additional degree of freedom (621.54 with 90
df), the PFI was a slightly improved .83 compared to the PFI of .82
for the other models. This suggests that the PFI as a goodness-of-fit
index is sensitive to the advantages of increased parsimony in the
one-factor model. When the fit indices for a simpler factor structure
appear to be nearly the same as those for more complex structures, the
more parsimonious model usually provides the better representation of
the data.

An examination of the estimated correlation between the two latent
factors in each of the two-factor models provides additional support
for the one-factor solution. For Model 3, Model 5, and Model 7, the
estimated correlation between latent factors was .990, .996, and .981,
respectively. When standard errors for these estimates are taken into
account (.005, .004, and .009, respectively), the latent factors in
each two-factor model are, in effect, perfectly correlated, implying a
single factor. Based on evaluation of all the CFA results, then, Model
2 (the one factor model) was selected as providing the best, most
parsimonious, fit to the data for testing subsequent invariance
hypotheses.

Invariance analyses. The results of the tests for invariance of the
one-factor model are presented in Table 2. In the first row for each
hypothesis are results using the full sample sizes for seniors and
examinees (N = 2532 and N = 699, respectively). In the second row for
each hypothesis are results based on the critical-N sample sizes (N =
400 for both groups).

When based on the full sample sizes, the chi-square values for all
invariance hypotheses were highly significant. Based on the large and
unequal sample sizes for the two groups, significant chi-square values
were not unexpected. However, the GFI and RMR improved considerably
from the Equal-SIGMA to the Equal-k hypothesis (from .951 to .972 for
GFI; from .146 to .026 for RMR), suggesting that the one-factor
solution provides a satisfactory fit for both groups. For the Equal-k
and Equal-LAMBDA hypotheses, the GFI results are .972 and .967,
respectively, and the RMR results are .026 and .044, respectively.
Although these indices suggest a slight decline in fit for the
Equal-LAMBDA model, the differences between the fit indices under each
hypothesis are very small. Indeed, one could argue that these
differences are not substantively important and that these results in
fact support the conclusion that factor loadings (LAMBDA) are equal
across groups. However, there are no known statistical criteria for
evaluating how large such differences should be in order to judge them
important. When the chi-square difference test was applied, the result
was a statistically significant chi-square difference of 32.19 with 14
degrees of freedom between the Equal-k and the Equal-LAMBDA
hypotheses. If based on statistical significance alone, this outcome
would indicate that constraining the one factor model to equal factor
loadings across groups results in a significantly poorer fit to the
data. However, the chi-square difference of 32.19 represents only a 4%
decrement in fit for the Equal-LAMBDA model (806.27) compared to the
chi-square for Equal-k (774.08), which suggests that these differences
may not be meaningful.

In the comparison between Equal-LAMBDA and Equal-THETA, all
goodness-of-fit indices suggest a much poorer fit for the Equal-THETA
model. The GFI and RMR values become poorer (.951 and .050,
respectively), the chi-square difference test is highly significant
(129.12 with 15 df), and the chi-square value represents a 16% poorer
fit. Therefore, the results based on the large sample sizes provide
support for the Equal-LAMBDA model but not for the Equal-THETA model.
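
The chi-square difference tests and percentage decrements reported
above can be reproduced from the published values. The sketch below is
an illustration using only the Python standard library; the function
names are hypothetical, and the tail probability is computed from the
standard recurrence for the incomplete gamma function, which is valid
for whole-number degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_free, chi2_restricted, df_diff):
    """Return (difference, p-value, percent increase over the freer model)."""
    diff = chi2_restricted - chi2_free
    return diff, chi2_sf(diff, df_diff), 100.0 * diff / chi2_free


# Equal-k (774.08) versus Equal-LAMBDA (806.27), 14 degrees of freedom.
diff, p_value, pct = chi2_difference_test(774.08, 806.27, 14)

# Equal-LAMBDA versus Equal-THETA: difference of 129.12 with 15 df.
p_theta = chi2_sf(129.12, 15)
pct_theta = 100.0 * 129.12 / 806.27
```

The percentage figures express each chi-square difference relative to
the chi-square of the less constrained model, which is how the 4% and
16% decrements in the text are obtained.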

If the chi-square test using a large sample size is viewed as
providing so much power that trivial differences are magnified in
importance, then it may be instructive to determine whether the
chi-square (probability) test, which is not a useful indicator of fit
in large samples, produces an indication of better fit when power
(sample size) is reduced so that only substantively meaningful
differences are detected. Although this line of reasoning has serious
shortcomings⁴, let us consider the goodness-of-fit results based on
sample sizes of 400.

⁴ This approach assumes that poor models would be rejected at a sample
size of 400. However, there is no apparent basis for quantifying
substantively meaningful differences a priori in terms of sample size.

For the analyses based on a critical-N of 400, all invariance tests
for the one-factor model produced non-significant chi-square values
except for the test for an invariant PHI matrix. The non-significant
chi-square for SIGMA (p = .106) suggests that the matrices of writing
test data for seniors and GED examinees are, in effect, equivalent.
Equivalence at this stage implies that the relationships among the
measured indicators have equivalent psychometric properties across
groups. Although such an outcome implies invariance of factor
structure, results from subsequent invariance tests are presented for
the purpose of completeness and in order to evaluate the specific
nature of invariance in these groups. The non-significant chi-square
for Equal-k (p = .380) supports the conclusion that a one factor model
provides a very good fit to both sets of data. In addition, the test
of invariant LAMBDA (p = .390) suggests that the factor loadings in
each group are equivalent, indicating that the observed variables are
measuring the same writing skill construct in the same metric in both
groups. With a critical-N of 400, the chi-square probability results
support the conclusions about invariance of factor loadings
(Equal-LAMBDA) which were based on judgments about the large sample
GFI, RMR, and chi-square difference results. That is, the chi-square
probability values associated with the tests for invariance of both
Equal-k and Equal-LAMBDA indicate very good fits to the data in both
groups. Although the critical-N test for equality of THETA is
non-significant (p = .054), suggesting a good fit, it is only just so.
A more conservative interpretation would suggest that the measures,
while highly similar, may not be equally precise in both groups. No
support was found for equality of variances for the writing skill
factor, or PHI, in both groups (p = .016). An examination of the
values for the latent factor variance in each group suggests that
writing skill performance in the senior group, as measured by the
multiple-choice writing test, varies nearly 50% more than in the GED
examinee group (1.438 and .951, respectively).

Conclusions and Implications

First, no support was found for the view that separate item-type
methods factors accounted for variability of performance on the
multiple-choice portion of the GED Writing Skills Test. Second, the
nature of writing skill measured by this test appears to be a single
construct reflecting generalized proofreading or editing skills. That
is, proofreading and editing writing skills which involve sentence
structure, usage, and mechanics do not appear to provide
systematically distinct sources of variation in performance on this
test. Third, the writing achievement measured by the multiple-choice
portion of the GED Writing Skills Test has a highly similar factor
structure in both high school seniors and young adult GED examinees.
That is, the writing test achievement of young adult GED examinees and
high school seniors is based on similar types of writing skills and
knowledge. Fourth, this study provided an empirical, field-based
illustration of the use of CFA procedures to evaluate the factor
structure of multiple-trait, multiple-method data and to test for the
invariance of plausible measurement models over multiple groups.

Both EFA and CFA procedures suggested that a one-factor model
accounted for common variability in the data. CFA tests of two-factor
models found estimated correlations between the two latent factors of
virtual unity. These empirical outcomes, along with substantive
knowledge about the generalized nature of the writing skills which the
test was designed to measure, led to the selection of the one factor
model as providing the best, most parsimonious fit to the data for the
seniors.

The goodness of fit indices for the large sample invariance analyses
suggested support for Equal-k and Equal-LAMBDA. The relatively high
GFI values (.972 and .967, respectively) and low RMR values (.026 and
.044, respectively) indicated that most of the meaningful variation in
the data had been accounted for in these models. However, the
chi-square difference test indicated that constraining the one-factor
model to have equal factor loadings across groups resulted in a
significantly poorer fit to the data, producing some ambiguity in the
decision to retain the LAMBDA hypothesis. Because the chi-square for
Equal-LAMBDA represented only a 4% decrement in goodness-of-fit, the
chi-square difference was judged not to be meaningful.

A re-analysis of the invariance hypotheses using sample sizes that
were just large enough, but not too large, produced chi-square values
for the Equal-SIGMA, Equal-k, and Equal-LAMBDA models that were highly
non-significant. If it is reasonable to assume that samples with a
critical-N of 400 have sufficient power to detect meaningful
differences in the factor structure between the two groups, but not
trivial ones, then the results based on the critical-N analyses
support the finding of factor structure invariance across groups of
both seniors and young adult GED examinees. Because the critical-N
invariance analyses were applied to the same data on which previous
invariance analyses were based, however, they cannot be interpreted as
providing a true test of factor structure invariance. It is
recommended that the one-factor model be cross-validated using new
sets of data for seniors and young adult GED examinees, with sample
sizes of at least 400 but no more than 500, in order to provide an
independent test of the invariance of the one-factor model. In
addition, future research should address the relationship between
sample sizes based on the concept of a critical-N and judgments of
what constitutes meaningful differences in model-data fit.

Because these invariance analyses included only those GED examinees
similar in age to the high school seniors, it is recommended that
similar studies be undertaken to determine if the one-factor model
provides a good fit to data for older GED examinees as well. Future
studies should also include direct measures of writing skill (essay)
in order to determine the relationship between both indirect and
direct measures of writing performance in seniors and GED examinees.

Although MTMM data have been examined using the Campbell-Fiske
criteria (Campbell & Fiske, 1959) and an ANOVA model (Kavanagh, MacKinney,
& Wolins, 1971; Stanley, 1961), CFA models have been found to provide
better tests of these matrices without the limitations inherent in the
other approaches (Marsh & Hocevar, 1983; Werts, Joreskog, & Linn, 1972).
This empirical study of writing test data from high school seniors and
young adult high school non-completers not only illustrated the
application of a useful methodology for examining data, but it also
contributed to our understanding of writing skill, a construct which is
becoming increasingly important to both researchers and practitioners.
Finally, the study addressed an important gap in the research literature
by evaluating the factor structure of writing skills in a population
about whom relatively little is known: young adults who have dropped out
of high school.






References 

Anderson, J.C. and Gerbing, D.W. 1984. The effect of sampling error on
convergence, improper solutions, and goodness of fit indices for
maximum likelihood confirmatory factor analysis. Psychometrika, 49,
155-173.

Bentler, P.M. 1980. Multivariate analysis with latent variables: Causal
modeling. In M.R. Rosenzweig and L.W. Porter (Eds.), Annual review
of psychology, 31, 419-456.

Bentler, P.M. and Bonett, D.G. 1980. Significance tests and goodness of
fit in the analysis of covariance structures. Psychological bulletin,
88, 588-606.

Birenbaum, M. and Tatsuoka, K.K. 1982. On the dimensionality of
achievement test data. Journal of educational measurement, 19,
259-266.

Boomsma, A. 1982. The robustness of LISREL against small sample sizes in
factor analysis models. In K.G. Joreskog & Wold (Eds.), Systems under
indirect observation: Causality, structure, prediction (Part 1, pp.
149-173). Amsterdam: North-Holland.

Boomsma, A. 1985. Nonconvergence, improper solutions, and starting values
in LISREL maximum likelihood estimation. Psychometrika, 50, 229-242.

Campbell, D. and Fiske, D. 1959. Convergent and discriminant validation
by the multitrait-multimethod matrix. Psychological bulletin, 56,
81-105.

Carmines, E. and McIver, J. 1981. Analyzing models with unobserved
variables: Analysis of covariance structures. In G. Bohrnstedt and E.
Borgatta (Eds.), Social measurement: Current issues. Beverly
Hills: Sage.

General Educational Development Testing Service. 1984. GED Tests
specifications committee report. Washington, D.C.: American Council on
Education.

General Educational Development Testing Service. 1987. The official
teacher's guide to the Tests of General Educational Development.
Washington, D.C.: American Council on Education.

Gorsuch, R.L. 1983. Factor analysis (2nd ed.). Hillsdale, N.J.: Lawrence
Erlbaum Associates.

Hayduk, L.A. 1987. Structural equation modeling with LISREL: Essentials
and advances. Baltimore, Md.: Johns Hopkins University Press.

Hoelter, J.W. 1983. The analysis of covariance structures: Goodness-
of-fit indices. Sociological methods and research, 11, 325-344.






James, L.R., Mulaik, S.A., and Brett, J.M. 1982. Causal analysis:
Assumptions, models, and data. Beverly Hills: Sage.

Joreskog, K. 1979. Analyzing psychological data by structural analysis
of covariance matrices. In K.G. Joreskog and D. Sorbom (Eds.),
Advances in factor analysis and structural equation models. Cambridge,
Mass.: Abt.

Joreskog, K.G., & Sorbom, D. 1983. LISREL: Analysis of linear structural
relationships by the method of maximum likelihood. User's Guide.
Versions V and VI (2nd ed.). Chicago: National Educational Resources,
Inc.

Kavanagh, M.J., MacKinney, A.C., & Wolins, L. 1971. Issues in managerial
performance: Multitrait-multimethod analyses of ratings. Psychological
bulletin, 75, 34-49.

Loehlin, J.C. 1987. Latent variable models: An introduction to factor,
path, and structural analysis. Hillsdale, N.J.: Lawrence Erlbaum
Associates.

Long, J. 1983. Confirmatory factor analysis. Quantitative applications
in the social sciences. Beverly Hills: Sage.

Marsh, H.W. & Hocevar, D. 1983. Confirmatory factor analysis of
multitrait-multimethod matrices. Journal of educational measurement,
20, 231-248.

Phillips, S.E. and Mehrens, W.A. 1987. Curricular differences and
unidimensionality of achievement test data: An exploratory analysis.
Journal of educational measurement, 24, 1-16.

Rock, D.A. and Werts, C.E. 1979. Construct validity of the SAT across
populations: An empirical confirmatory study. (Report No. RR-79-2).
Princeton, N.J.: Educational Testing Service.

Stanley, J.C. 1961. Analysis of unreplicated three way classifications
with application to rater bias and trait independence. Psychometrika,
26, 205-219.

Werts, C.E., Joreskog, K., and Linn, R. 1972. A multitrait-multimethod
model for studying growth. Educational and psychological measurement,
32, 655-678.






Table 1. Goodness of Fit Statistics for CFA Trait and Method
Models of Writing Skills of High School Seniors
as Measured by the GED Writing Skills Test-Part I

Model                  Chi-square    df   Chi-square/df   GFI    RMR    NR>2.0

1. Null                 20,617.70   105      196.35       .217   .504    all

2. 1 Trait                 621.54    90        6.91       .962   .023     10     .83

3. 2 Traits                617.86    89        6.94       .962   .023     10     .82
   (1 & 2 combined)

5. 2 Traits                620.65    89        6.97       .962   .023     10     .82
   (2 & 3 combined)

7. 2 Methods               617.10    89        6.93       .962   .023     10     .82
   (1 & 2 combined)



Table 2. Factor Structure Invariance Over Samples of
Graduating High School Seniors and Young Adult GED Examinees:
Summary of Fit Statistics

Hypothesis   N          Chi-square (df)   GFI    RMR    Chi-square/df   P-value

SIGMA        2532/699   397.89 (120)      .951   .146       3.32         .000
             400/400    139.70 (120)      .980   .093       1.16         .106

k            2532/699   774.08 (180)      .972   .026       4.30         .000
             400/400    185.18 (180)      .972   .026       1.03         .380

LAMBDA       2532/699   806.27 (194)      .967   .044       4.16         .000
             400/400    198.87 (194)      .969   .037       1.02         .390

THETA        2532/699   935.39 (209)      .951   .050       4.48         .000
             400/400    242.94 (209)      .963   .039       1.16         .054

PHI          2532/699   970.08 (210)      .948   .147       4.62         .000
             400/400    256.21 (210)      .961   .095       1.22         .016
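
The chi-square/df column of Table 2 can be recomputed directly from the
reported chi-square values and degrees of freedom. The following minimal
sketch does so for both the full samples and the 400/400 samples; the
variable and function names are chosen for this illustration only. Ratios
recomputed in this way agree with the tabled values to within rounding.

```python
# (hypothesis, sample sizes, chi-square, df) as reported in Table 2.
TABLE_2 = [
    ("SIGMA",  "2532/699", 397.89, 120),
    ("SIGMA",  "400/400",  139.70, 120),
    ("k",      "2532/699", 774.08, 180),
    ("k",      "400/400",  185.18, 180),
    ("LAMBDA", "2532/699", 806.27, 194),
    ("LAMBDA", "400/400",  198.87, 194),
    ("THETA",  "2532/699", 935.39, 209),
    ("THETA",  "400/400",  242.94, 209),
    ("PHI",    "2532/699", 970.08, 210),
    ("PHI",    "400/400",  256.21, 210),
]


def chi2_df_ratio(chi2, df):
    """Ratio of the model chi-square to its degrees of freedom."""
    return chi2 / df


# Map each (hypothesis, sample sizes) pair to its chi-square/df ratio.
ratios = {(hyp, n): chi2_df_ratio(chi2, df) for hyp, n, chi2, df in TABLE_2}

for (hyp, n), ratio in ratios.items():
    print(f"{hyp:<8}{n:<10}{ratio:5.2f}")
```

Because the model chi-square grows with sample size, the same equality
constraints yield ratios above 3 in the full samples and ratios close to 1
in the 400/400 samples.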






Attachment A

Writing Skill/Item Type Categories of Items for
15 Miniscale Variables from the GED Writing Skills Test, Part I

Variable Name   Number of Multiple-   Skill/Item Type Categories
                Choice Items

Variable 01            3              Sentence Structure/Sentence Correction
Variable 02            3              Sentence Structure/Sentence Correction
Variable 03                           Sentence Structure/Sentence Revision
Variable 04                           Sentence Structure/Sentence Revision
Variable 05                           Sentence Structure/Construction Shift
Variable 06            3              Usage/Sentence Correction
Variable 07            3              Usage/Sentence Correction
Variable 08            3              Usage/Sentence Correction
Variable 09            3              Usage/Sentence Revision
Variable 10            3              Usage/Sentence Revision
Variable 11            3              Usage/Construction Shift
Variable 12            4              Mechanics/Sentence Correction
Variable 13            3              Mechanics/Sentence Correction
Variable 14            3              Mechanics/Sentence Correction
Variable 15            5              Mechanics/Sentence Revision
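
The classification in Attachment A can be expressed as a simple data
structure from which trait and method loading patterns are derived. The
sketch below is illustrative only: the names MINISCALES and pattern_matrix
are hypothetical, and the 0/1 patterns merely indicate which miniscale
variables would be free to load on which factor under a trait model, a
method model, or the one-factor model. They are not the LISREL
specifications used in the study.

```python
# (trait, method) classification of the 15 miniscale variables (Attachment A).
MINISCALES = {
    1: ("Sentence Structure", "Sentence Correction"),
    2: ("Sentence Structure", "Sentence Correction"),
    3: ("Sentence Structure", "Sentence Revision"),
    4: ("Sentence Structure", "Sentence Revision"),
    5: ("Sentence Structure", "Construction Shift"),
    6: ("Usage", "Sentence Correction"),
    7: ("Usage", "Sentence Correction"),
    8: ("Usage", "Sentence Correction"),
    9: ("Usage", "Sentence Revision"),
    10: ("Usage", "Sentence Revision"),
    11: ("Usage", "Construction Shift"),
    12: ("Mechanics", "Sentence Correction"),
    13: ("Mechanics", "Sentence Correction"),
    14: ("Mechanics", "Sentence Correction"),
    15: ("Mechanics", "Sentence Revision"),
}


def pattern_matrix(index):
    """Return (labels, rows): one 0/1 row per variable, one column per factor.

    index 0 groups variables by trait (content area); index 1 by method (item type).
    """
    labels = sorted({pair[index] for pair in MINISCALES.values()})
    rows = [
        [1 if MINISCALES[v][index] == label else 0 for label in labels]
        for v in sorted(MINISCALES)
    ]
    return labels, rows


trait_labels, trait_pattern = pattern_matrix(0)
method_labels, method_pattern = pattern_matrix(1)

# One-factor model: every miniscale loads on a single general factor.
one_factor_pattern = [[1] for _ in MINISCALES]

# Number of variables assigned to each trait and each method.
trait_counts = dict(zip(trait_labels, map(sum, zip(*trait_pattern))))
method_counts = dict(zip(method_labels, map(sum, zip(*method_pattern))))
print(trait_counts)
print(method_counts)
```

The counts show that the design is not fully crossed: Construction Shift
items appear in only two miniscales, and no Mechanics miniscale uses that
item type.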






